Non-Normal Mixtures of Experts
Abstract
Mixture of Experts (MoE) is a popular framework for modeling heterogeneity in data for regression, classification and clustering. For continuous data, which we consider here in the context of regression and cluster analysis, MoE models usually use normal experts, that is, expert components following the Gaussian distribution. However, for data containing a group or groups of observations with asymmetric behavior, heavy tails or atypical observations, normal experts may be unsuitable and can unduly affect the fit of the MoE model. In this paper, we introduce new non-normal mixtures of experts (NNMoE) that can handle possibly skewed, heavy-tailed data and data with outliers. The proposed models are the skew-normal MoE, the robust t MoE and the skew t MoE, named SNMoE, TMoE and STMoE, respectively. We develop dedicated expectation-maximization (EM) and expectation conditional maximization (ECM) algorithms to estimate the parameters of the proposed models by monotonically maximizing the observed-data log-likelihood. We describe how the presented models can be used in prediction and in model-based clustering of regression data. Numerical experiments on simulated data show the effectiveness and robustness of the proposed models in modeling non-linear regression functions as well as in model-based clustering. Then, to show their usefulness in practical applications, the proposed models are applied to real-world tone perception data for musical data analysis and to temperature anomaly data for the analysis of climate change.
Keywords: mixture of experts, skew normal distribution, t distribution, skew t distribution, EM algorithm, ECM algorithm, non-linear regression, model-based clustering
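As a rough illustration of the model family the abstract describes, the sketch below evaluates a mixture-of-experts conditional density with softmax gating and linear-mean experts, once with normal experts and once with Student-t experts. The function names (`softmax_gates`, `moe_density`), the scalar-predictor parameterization, the single shared degrees-of-freedom value and all numbers are assumptions made for illustration; they are not taken from the paper. The skew-normal and skew t experts and the EM/ECM fitting are not shown.

```python
import math
import numpy as np


def softmax_gates(x, alpha):
    """Gating probabilities pi_k(x) for a scalar predictor x.

    alpha has shape (K, 2): intercept and slope of each gate's linear score.
    """
    scores = alpha[:, 0] + alpha[:, 1] * x
    scores = scores - scores.max()  # stabilise the exponentials
    weights = np.exp(scores)
    return weights / weights.sum()


def normal_pdf(y, mu, sigma):
    """Gaussian density with mean mu and standard deviation sigma."""
    z = (y - mu) / sigma
    return np.exp(-0.5 * z**2) / (sigma * math.sqrt(2.0 * math.pi))


def student_t_pdf(y, mu, sigma, nu):
    """Location-scale Student-t density; nu is a scalar shared by all experts
    (an assumption made here for brevity)."""
    z = (y - mu) / sigma
    log_c = (math.lgamma((nu + 1.0) / 2.0) - math.lgamma(nu / 2.0)
             - 0.5 * math.log(nu * math.pi))
    return np.exp(log_c - (nu + 1.0) / 2.0 * np.log1p(z**2 / nu)) / sigma


def moe_density(y, x, alpha, beta, sigma, nu=None):
    """Conditional density f(y | x) = sum_k pi_k(x) * f_k(y | x).

    beta has shape (K, 2): intercept and slope of each expert's mean.
    With nu=None the experts are normal; otherwise they are Student-t.
    """
    gates = softmax_gates(x, alpha)
    mu = beta[:, 0] + beta[:, 1] * x
    if nu is None:
        experts = normal_pdf(y, mu, sigma)
    else:
        experts = student_t_pdf(y, mu, sigma, nu)
    return float(np.sum(gates * experts))


# Hypothetical two-expert parameters, chosen only for the demonstration.
alpha = np.array([[0.0, 0.0], [0.5, 1.0]])
beta = np.array([[0.0, 1.0], [2.0, -1.0]])
sigma = np.array([0.5, 0.8])

# Density assigned to an outlying response under each choice of expert.
print(moe_density(10.0, 0.3, alpha, beta, sigma))          # normal experts
print(moe_density(10.0, 0.3, alpha, beta, sigma, nu=3.0))  # t experts
```

Under these assumed parameters, the t experts assign far more density to the outlying response than the normal experts do. This is the heavy-tail behavior that motivates replacing normal experts when the data contain atypical observations.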
Similar resources
On the asymptotic normality of hierarchical mixtures-of-experts for generalized linear models
In the class of hierarchical mixtures-of-experts (HME) models, "experts" in the exponential family with generalized linear mean functions of the form $\psi(\alpha + x^T \beta)$ are mixed, according to a set of local weights called the "gating functions" depending on the predictor $x$. Here $\psi(\cdot)$ is the inverse link function. We provide regularity conditions on the experts and on the gating functions under which the ...
Robust mixture of experts modeling using the skew $t$ distribution
Mixture of Experts (MoE) is a popular framework in the fields of statistics and machine learning for modeling heterogeneity in data for regression, classification and clustering. MoE for continuous data are usually based on the normal distribution. However, it is known that for data with asymmetric behavior, heavy tails and atypical observations, the use of the normal distribution is unsuitable...
On the identifiability of mixtures-of-experts
In mixtures-of-experts (ME) models, "experts" of generalized linear models are combined, according to a set of local weights called the "gating function". The invariant transformations of the ME probability density functions include the permutations of the expert labels and the translations of the parameters in the gating functions. Under certain conditions, we show that the ME systems are iden...
Bayesian Inference in Mixtures-of-Experts and Hierarchical Mixtures-of-Experts Models With an Application to Speech Recognition
Machine classification of acoustic waveforms as speech events is often difficult due to context-dependencies. A vowel recognition task with multiple speakers is studied in this paper via the use of a class of modular and hierarchical systems referred to as mixtures-of-experts and hierarchical mixtures-of-experts models. The statistical model underlying the systems is a mixture model in which both ...
Mixture of experts architectures for neural networks as a special case of conditional expectation formula
Recently a new interesting architecture of neural networks called "mixture of experts" has been proposed as a tool for real multivariate approximation or classification. It is shown that, in some cases, the underlying problem of prediction can be solved by estimating the joint probability density of the involved variables. Assuming the model of Gaussian mixtures we can explicitly write the optimal mi...
Journal title:
- CoRR
Volume abs/1506.06707
Publication date 2015